Fairbanks
FaStfact: Faster, Stronger Long-Form Factuality Evaluations in LLMs
Wan, Yingjia, Tan, Haochen, Zhu, Xiao, Zhou, Xinyu, Li, Zhiwei, Lv, Qingsong, Sun, Changxuan, Zeng, Jiaqi, Xu, Yi, Lu, Jianqiao, Liu, Yinhong, Guo, Zhijiang
Evaluating the factuality of long-form generations from Large Language Models (LLMs) remains challenging due to efficiency bottlenecks and reliability concerns. Prior efforts attempt this by decomposing text into claims, searching for evidence, and verifying claims, but suffer from critical drawbacks: (1) inefficiency due to overcomplicated pipeline components, and (2) ineffectiveness stemming from inaccurate claim sets and insufficient evidence. To address these limitations, we propose FaStfact, an evaluation framework that achieves the highest alignment with human evaluation and time/token efficiency among existing baselines. FaStfact first employs chunk-level claim extraction integrated with confidence-based pre-verification, significantly reducing the time and token cost while ensuring reliability. For searching and verification, it collects document-level evidence from crawled web pages and selectively retrieves it during verification. Extensive experiments based on an annotated benchmark, FaStfact-Bench, demonstrate the reliability of FaStfact in both efficiently and effectively evaluating long-form factuality. Code, benchmark data, and annotation interface tool are available at https://github.com/Yingjia-Wan/FaStfact.
- North America > United States > New Jersey > Bergen County > Rutherford (0.14)
- North America > United States > Florida > Miami-Dade County > Miami (0.14)
- Europe > Austria > Vienna (0.14)
- (26 more...)
- Telecommunications (1.00)
- Leisure & Entertainment > Sports > Soccer (1.00)
- Information Technology (1.00)
- (4 more...)
On Thin Ice: Towards Explainable Conservation Monitoring via Attribution and Perturbations
Zhou, Jiayi, Aghakishiyeva, Günel, Arya, Saagar, Dale, Julian, Poling, James David, Houliston, Holly R., Womble, Jamie N., Larsen, Gregory D., Johnston, David W., Bent, Brinnae
Computer vision can accelerate ecological research and conservation monitoring, yet adoption in ecology lags in part because of a lack of trust in black-box neural-network-based models. We seek to address this challenge by applying post-hoc explanations to provide evidence for predictions and document limitations that are important to field deployment. Using aerial imagery from Glacier Bay National Park, we train a Faster R-CNN to detect pinnipeds (harbor seals) and generate explanations via gradient-based class activation mapping (HiResCAM, LayerCAM), local interpretable model-agnostic explanations (LIME), and perturbation-based explanations. We assess explanations along three axes relevant to field use: (i) localization fidelity: whether high-attribution regions coincide with the animal rather than background context; (ii) faithfulness: whether deletion/insertion tests produce changes in detector confidence; and (iii) diagnostic utility: whether explanations reveal systematic failure modes. Explanations concentrate on seal torsos and contours rather than surrounding ice/rock, and removal of the seals reduces detection confidence, providing model-evidence for true positives. The analysis also uncovers recurrent error sources, including confusion between seals and black ice and rocks. We translate these findings into actionable next steps for model development, including more targeted data curation and augmentation. By pairing object detection with post-hoc explainability, we can move beyond "black-box" predictions toward auditable, decision-supporting tools for conservation monitoring.
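As an illustration of the localization-fidelity axis described above, the following minimal sketch computes the share of total attribution that falls inside an object's bounding box. It is not the authors' metric: the function name, the list-of-lists heatmap, and the half-open box convention `(x0, y0, x1, y1)` are all assumptions made for this example.

```python
def attribution_inside_box(heatmap, box):
    """Fraction of total (non-negative) attribution lying inside a box.

    heatmap: list of rows, each a list of non-negative floats (hypothetical format).
    box: (x0, y0, x1, y1) in pixel indices, half-open on the right/bottom (assumed).
    """
    x0, y0, x1, y1 = box
    total = sum(sum(row) for row in heatmap)
    if total == 0:
        return 0.0  # no attribution anywhere, so nothing to localize
    inside = sum(heatmap[y][x] for y in range(y0, y1) for x in range(x0, x1))
    return inside / total


# Toy 3x3 attribution map; the "seal" occupies the lower-right 2x2 region.
toy_map = [
    [1.0, 0.0, 0.0],
    [0.0, 2.0, 1.0],
    [0.0, 1.0, 0.0],
]
score = attribution_inside_box(toy_map, (1, 1, 3, 3))
```

A score near 1 would suggest the explanation concentrates on the animal; a low score would suggest it leans on background context such as ice or rock.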
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- Europe > Switzerland > Zürich > Zürich (0.14)
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- (4 more...)
Photorealistic Inpainting for Perturbation-based Explanations in Ecological Monitoring
Aghakishiyeva, Günel, Zhou, Jiayi, Arya, Saagar, Dale, Julian, Poling, James David, Houliston, Holly R., Womble, Jamie N., Larsen, Gregory D., Johnston, David W., Bent, Brinnae
Ecological monitoring is increasingly automated by vision models, yet opaque predictions limit trust and field adoption. We present an inpainting-guided, perturbation-based explanation technique that produces photorealistic, mask-localized edits that preserve scene context. Unlike masking or blurring, these edits stay in-distribution and reveal which fine-grained morphological cues drive predictions in tasks such as species recognition and trait attribution. We demonstrate the approach on a YOLOv9 detector fine-tuned for harbor seal detection in Glacier Bay drone imagery, using Segment-Anything-Model-refined masks to support two interventions: (i) object removal/replacement (e.g., replacing seals with plausible ice/water or boats) and (ii) background replacement with original animals composited onto new scenes. Explanations are assessed by re-scoring perturbed images (flip rate, confidence drop) and by expert review for ecological plausibility and interpretability. The resulting explanations localize diagnostic structures, avoid deletion artifacts common to traditional perturbations, and yield domain-relevant insights that support expert validation and more trustworthy deployment of AI in ecology.
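The re-scoring step mentioned above (flip rate, confidence drop) could be computed along the following lines. This is a sketch under stated assumptions, not the paper's implementation: the 0.5 detection threshold, the function name, and the exact definitions (a "flip" is a detection that falls below threshold after perturbation; the drop is averaged over originally detected objects) are choices made for illustration.

```python
def flip_rate_and_drop(orig_conf, pert_conf, threshold=0.5):
    """Return (flip_rate, mean_confidence_drop) over originally detected objects.

    orig_conf / pert_conf: detector confidences before and after perturbation,
    aligned by object. The threshold value is an assumption for this sketch.
    """
    detected = [(o, p) for o, p in zip(orig_conf, pert_conf) if o >= threshold]
    if not detected:
        return 0.0, 0.0
    flips = sum(1 for _, p in detected if p < threshold)
    mean_drop = sum(o - p for o, p in detected) / len(detected)
    return flips / len(detected), mean_drop


# Four objects; the last one was never detected and is ignored.
rate, drop = flip_rate_and_drop([0.9, 0.8, 0.6, 0.3], [0.2, 0.7, 0.4, 0.1])
```

In this toy input, two of the three original detections fall below the threshold after the edit.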
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- Europe > Switzerland > Zürich > Zürich (0.14)
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- (3 more...)
- Transportation > Air (0.47)
- Health & Medicine (0.46)
- Government (0.46)
Hybrid Physics-ML Framework for Pan-Arctic Permafrost Infrastructure Risk at Record 2.9-Million Observation Scale
Arctic warming threatens over $100 billion in permafrost-dependent infrastructure across Northern territories, yet existing risk assessment frameworks lack spatiotemporal validation, uncertainty quantification, and operational decision-support capabilities. We present a hybrid physics-machine learning framework integrating 2.9 million observations from 171,605 locations (2005-2021) combining permafrost fraction data with climate reanalysis. Our stacked ensemble model (Random Forest + Histogram Gradient Boosting + Elastic Net) achieves R=0.980 (RMSE=5.01 pp) with rigorous spatiotemporal cross-validation preventing data leakage. To address machine learning limitations in extrapolative climate scenarios, we develop a hybrid approach combining learned climate-permafrost relationships (60%) with physical permafrost sensitivity models (40%, -10 pp/°C). Under RCP8.5 forcing (+5 °C over 10 years), we project mean permafrost fraction decline of -20.3 pp (median: -20.0 pp), with 51.5% of Arctic Russia experiencing over 20 percentage point loss. Infrastructure risk classification identifies 15% high-risk zones (25% medium-risk) with spatially explicit uncertainty maps. Our framework represents the largest validated permafrost ML dataset globally, provides the first operational hybrid physics-ML forecasting system for Arctic infrastructure, and delivers open-source tools enabling probabilistic permafrost projections for engineering design codes and climate adaptation planning. The methodology is generalizable to other permafrost regions and demonstrates how hybrid approaches can overcome pure data-driven limitations in climate change applications.
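One way to read the 60/40 hybrid described above is as a weighted sum of an ML-predicted change and a linear physical sensitivity term. The abstract gives only the weights and the sensitivity, so the linear blend below is an assumption, and the function and parameter names are hypothetical.

```python
def hybrid_permafrost_change(ml_change_pp, warming_c,
                             w_ml=0.6, w_phys=0.4,
                             sensitivity_pp_per_c=-10.0):
    """Blend an ML-predicted change with a linear physical sensitivity term.

    ml_change_pp: change in permafrost fraction predicted by the learned model,
                  in percentage points (pp).
    warming_c:    warming in degrees Celsius.
    The weighted-sum form is an assumed reading of the abstract, not a
    confirmed description of the authors' method.
    """
    physical_change_pp = sensitivity_pp_per_c * warming_c
    return w_ml * ml_change_pp + w_phys * physical_change_pp


# Physical term only: the ML model predicts no change under +5 degrees C.
physical_only = hybrid_permafrost_change(ml_change_pp=0.0, warming_c=5.0)
```

Under this assumed form, a warming of 5 °C gives a physical term of -10 × 5 = -50 pp, which contributes 0.4 × -50 = -20 pp to the blend before any learned component is added.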
- Europe > Russia (0.25)
- Asia > Russia (0.25)
- Oceania > Australia > Northern Territory (0.24)
- (3 more...)
Fine-Scale Soil Mapping in Alaska with Multimodal Machine Learning
Lin, Yijun, Chen, Theresa, Brungard, Colby, Sabine, Grunwald, Ives, Sue, Macander, Matt, Nawrocki, Timm, Chiang, Yao-Yi, Jelinski, Nic
Fine-scale soil mapping in Alaska, traditionally relying on fieldwork and localized simulations, remains a critical yet underdeveloped task, despite the region's ecological importance and extensive permafrost coverage. As permafrost thaw accelerates due to climate change, it threatens infrastructure stability and key ecosystem services, such as soil carbon storage. High-resolution soil maps are essential for characterizing permafrost distribution, identifying vulnerable areas, and informing adaptation strategies. We present MISO, a vision-based machine learning (ML) model to produce statewide fine-scale soil maps for near-surface permafrost and soil taxonomy. The model integrates a geospatial foundation model for visual feature extraction, implicit neural representations for continuous spatial prediction, and contrastive learning for multimodal alignment and geo-location awareness. We compare MISO with Random Forest (RF), a traditional ML model that has been widely used in soil mapping applications. Spatial cross-validation and regional analysis across Permafrost Zones and Major Land Resource Areas (MLRAs) show that MISO generalizes better to remote, unseen locations and achieves higher recall than RF, which is critical for monitoring permafrost thaw and related environmental processes. These findings demonstrate the potential of advanced ML approaches for fine-scale soil mapping and provide practical guidance for future soil sampling and infrastructure planning in permafrost-affected landscapes. The project will be released at https://github.com/knowledge-computing/Peatland-permafrost.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.28)
- North America > United States > Florida > Alachua County > Gainesville (0.14)
- North America > United States > Alaska > Fairbanks North Star Borough > Fairbanks (0.14)
- (10 more...)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.93)
Towards Large Reasoning Models for Agriculture
Zaremehrjerdi, Hossein, Ganguly, Shreyan, Rairdin, Ashlyn, Tranel, Elizabeth, Feuer, Benjamin, Di Salvo, Juan Ignacio, Panthulugiri, Srikanth, Pacin, Hernan Torres, Moser, Victoria, Jones, Sarah, Raigne, Joscif G, Shen, Yanben, Dornath, Heidi M., Balu, Aditya, Krishnamurthy, Adarsh, Singh, Asheesh K, Singh, Arti, Ganapathysubramanian, Baskar, Hegde, Chinmay, Sarkar, Soumik
Agricultural decision-making involves complex, context-specific reasoning, where choices about crops, practices, and interventions depend heavily on geographic, climatic, and economic conditions. Traditional large language models (LLMs) often fall short in navigating this nuanced problem due to limited reasoning capacity. We hypothesize that recent advances in large reasoning models (LRMs) can better handle such structured, domain-specific inference. To investigate this, we introduce AgReason, the first expert-curated open-ended science benchmark with 100 questions for agricultural reasoning. Evaluations across thirteen open-source and proprietary models reveal that LRMs outperform conventional ones, though notable challenges persist, with the strongest Gemini-based baseline achieving 36% accuracy. We also present AgThoughts, a large-scale dataset of 44.6K question-answer pairs generated with human oversight and equipped with synthetically generated reasoning traces. Using AgThoughts, we develop AgThinker, a suite of small reasoning models that can be run on consumer-grade GPUs, and show that our dataset can be effective in unlocking agricultural reasoning abilities in LLMs. Our project page is here: https://baskargroup.github.io/Ag_reasoning/
- North America > United States > Montana (0.14)
- North America > United States > North Dakota (0.14)
- North America > United States > Missouri (0.05)
- (43 more...)
- Materials > Chemicals > Agricultural Chemicals (1.00)
- Food & Agriculture > Agriculture > Pest Control (0.94)
- Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.67)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)
GroundHog: Revolutionizing GLDAS Groundwater Storage Downscaling for Enhanced Recharge Estimation in Bangladesh
Ahmed, Saleh Sakib, Zzaman, Rashed Uz, Jony, Saifur Rahman, Himel, Faizur Rahman, Sharmin, Afroza, Rahman, A. H. M. Khalequr, Rahman, M. Sohel, Nowreen, Sara
Long-term groundwater level (GWL) measurement is vital for effective policymaking and recharge estimation using annual maxima and minima. However, current methods prioritize short-term predictions and lack multi-year applicability, limiting their utility. Moreover, sparse in-situ measurements lead to reliance on low-resolution satellite data like GLDAS as the ground truth for Machine Learning models, further constraining accuracy. To overcome these challenges, we first develop an ML model to mitigate data gaps, achieving $R^2$ scores of 0.855 and 0.963 for maximum and minimum GWL predictions, respectively. Subsequently, using these predictions and well observations as ground truth, we train an Upsampling Model that uses low-resolution (25 km) GLDAS data as input to produce high-resolution (2 km) GWLs, achieving an excellent $R^2$ score of 0.96. Our approach successfully upscales GLDAS data for 2003-2024, allowing high-resolution recharge estimations and revealing critical trends for proactive resource management. Our method allows upsampling of groundwater storage (GWS) from GLDAS to high-resolution GWLs for any points independently of officially curated piezometer data, making it a valuable tool for decision-making.
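For readers unfamiliar with the $R^2$ scores quoted above, the sketch below computes the coefficient of determination in its standard form, $R^2 = 1 - SS_{res}/SS_{tot}$. The function name and toy values are illustrative only and are not taken from the paper's data or code.

```python
def r2_score(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when observations have zero variance")
    return 1.0 - ss_res / ss_tot


# Toy groundwater levels (arbitrary units) and two candidate predictions.
levels = [1.0, 2.0, 3.0]
perfect = r2_score(levels, [1.0, 2.0, 3.0])      # predictions match exactly
mean_only = r2_score(levels, [2.0, 2.0, 2.0])    # always predicts the mean
```

A perfect prediction scores 1, and a model that always predicts the observed mean scores 0, which puts the reported values of 0.855 to 0.963 in context.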
- Asia > Bangladesh > Dhaka Division > Dhaka District > Dhaka (0.06)
- Asia > India > Maharashtra (0.04)
- Asia > China (0.04)
- (15 more...)
They wanted to save us from a dark AI future. Then six people were killed
Years before she became the peculiar central thread linking a double homicide in Pennsylvania, the fatal shooting of a federal agent in Vermont and the murder of an elderly landlord in California, a computer programmer bought a sailboat. The programmer was known to friends, foes and followers as Ziz. She had come to the San Francisco Bay Area in 2016 as part of an influx of young people arriving to study the dangers that artificial intelligence could pose to humanity. In one of the most expensive regions of the United States, however, it is difficult to save the world when you can't make rent. So she bought a boat for 600 and moored it next to a friend's vessel in a marina. For five years, she used it as an occasional, cramped bunk. In her waking hours, she worked on a blog of provocative and increasingly extreme ideas about confrontation and retaliation. At night, she fell asleep as the boat rocked back and forth, drifting with the flotsam of greater Silicon Valley. Then, on the night of 19 August 2022, her sister and a friend reported that they saw her fall overboard. The Coast Guard and local authorities scrambled boats and aircraft. After a nearly 30-hour search, neither Ziz nor her body could be found. A newspaper in Alaska, where she was born, published a short obituary referring to her by her birth name: "Jack Amadeus LaSota left our lives but not our hearts on Aug 19 after a boating accident. Loving adventure, friends and family, music, blueberries, biking, computer games and animals, you are missed." Ziz's ideas did not die in the waters of the California coast. She had faked her drowning and gone underground, before being arrested last month in western Maryland and charged with trespassing and illegal transportation of a firearm. The targets of Ziz's ire, who include some of Silicon Valley's most prominent intellectuals, have taken security precautions. "Ziz is not stupid," someone familiar with her, who asked to remain anonymous, told me. 
"This is a very smart person – both smart and crazy." Ziz's writing had polarized members of a niche but influential movement of AI theorists and tech bloggers who call themselves the "rationalists". The movement is less about specific ideas than it is about an ethos – applying rigorous, mathematically informed thinking to AI, philosophy, psychology and the big questions of our time. Rationalists are odd, though often charming, people. They tend to be fantasy and sci-fi geeks, use lots of jargon and think intensely about things other people barely think about at all.
- North America > United States > Vermont (0.25)
- North America > United States > Pennsylvania (0.25)
- North America > United States > California > San Francisco County > San Francisco (0.25)
- (11 more...)
- Law > Criminal Law (1.00)
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
- Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)
Transgender cult leader linked to border agent killing maintains innocence, asks for vegan food in jail
Post Millennial senior editor Andy Ngo unpacks what led to the arrests of members of an apparent transgender vegan cult on 'The Ingraham Angle.' The apparent head of a radical transgender cult linked to six killings, including a U.S. Border Patrol agent, told a Maryland judge last week, "I haven't done anything wrong" while pleading for access to vegan food behind bars. "I might starve to death if you cannot answer me," Jack Amadeus LaSota, 34, who goes by "Ziz," told Judge Erich Bean during a bail hearing in Allegany County District Court in Maryland on Feb. 18, according to audio obtained by the San Francisco Chronicle. "I need the jail to be ordered for me to have a vegan diet. It's more important than whatever this hearing is."
- North America > United States > Maryland (0.49)
- North America > United States > California > San Francisco County > San Francisco (0.25)
- North America > United States > Vermont (0.11)
- (4 more...)
- Information Technology > Communications (0.49)
- Information Technology > Artificial Intelligence > Robots (0.49)
Forecasting Local Ionospheric Parameters Using Transformers
Alford-Lago, Daniel J., Curtis, Christopher W., Ihler, Alexander T., Zawdie, Katherine A., Drob, Douglas P.
Accurate and efficient modeling of Earth's ionosphere has a significant impact on research and operational communities due to its effects on radio communications, radar performance [1, 2, 3], and satellite drag [4]. Success in forecasting key parameters such as the F2 layer critical frequency (foF2) and height (hmF2) and the total electron content (TEC) allows one to anticipate and mitigate the impacts of ionospheric variability on such systems. Over the past decades, many modeling approaches have been developed to predict these ionospheric parameters with increasing accuracy and skill. These models may be broadly categorized as empirical, physics-based, and, more recently, machine learning methods. Empirical models often rely on extensive historical datasets to establish statistical relationships between ionospheric parameters and geophysical variables. The International Reference Ionosphere (IRI) model [5] is a widely used standard that provides monthly averages of various ionospheric parameters based on many decades of past observations. IRI has seen continual development and improvements over the years, adding a host of submodels used to capture specific aspects of the ionosphere such as the CCIR [6, 7] and URSI [8] foF2 models for representing the diurnal variations of the peak plasma density across the globe, the AMTB [9] and SHU-2015 [10] models for even more harmonic expansions of hmF2, and NeQuick 2 [11] for improved topside electron density accuracy and thus better estimates of TEC [12, 13]. So, while large empirical models like IRI continue to improve, the number of available options needed to address each domain and source of variance in the ionosphere also grows, and choosing the appropriate settings may be prohibitive without expert knowledge of each submodel.
- North America > United States > Colorado > Boulder County > Boulder (0.14)
- Oceania > Australia (0.04)
- Oceania > Guam (0.04)
- (13 more...)
- Energy (0.69)
- Government > Regional Government > North America Government > United States Government (0.67)